    Active and assisted living ecosystem for the elderly

    A novel ecosystem to promote the physical, emotional and mental health and well-being of the elderly is presented. Our proposal was designed to integrate several services developed to meet the needs of the senior population, namely services that improve social inclusion and increase seniors' contribution to society. Moreover, the solution monitors the vital signs of elderly individuals, as well as environmental parameters and behavior patterns, in order to detect situations of imminent danger and predict potentially hazardous issues, acting in accordance with the alert levels specified for each individual. The platform was tested by seniors in a real scenario. The experimental results showed that the proposed ecosystem was well accepted by seniors, who found it easy to use.

    Improving pipelining tools for pre-processing data

    The last several years have seen the emergence of data mining and its transformation into a powerful tool that adds value to business and research. Data mining makes it possible to explore and find unseen connections between variables and facts observed in different domains, helping us to better understand reality. The programming methods and frameworks used to analyse data have evolved over time. Pipelining schemes are currently the most reliable way of analysing data, and for this reason several major companies now offer services of this kind. Moreover, several frameworks compatible with different programming languages are available for the development of computational pipelines, and many research studies have addressed the optimization of data processing speed. However, as this study shows, early error detection techniques and developer support mechanisms are very limited in these frameworks. In this context, this study introduces several improvements: the design of different types of constraints for the early detection of errors, the creation of functions to facilitate debugging of specific tasks included in a pipeline, the invalidation of erroneous instances, and the introduction of a burst-processing scheme. By adding these functionalities, we developed Big Data Pipelining for Java (BDP4J, https://github.com/sing-group/bdp4j), a fully functional new pipelining framework that shows the potential of these features. Funding: Agencia Estatal de Investigación | Ref. TIN2017-84658-C2-1-R; Xunta de Galicia | Ref. ED481D-2021/024; Xunta de Galicia | Ref. ED431C2018/55-GR
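
    The ideas listed in this abstract (constraints checked before execution, invalidation of erroneous instances, and processing instances in bursts) can be illustrated with a minimal sketch. It is not BDP4J's API: the names PipelineSketch, Task, Instance, check and run are hypothetical, and the only constraint shown is an assumed type-compatibility rule between consecutive tasks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch; none of these names are taken from BDP4J. */
class PipelineSketch {

    /** A data item that a task can mark as invalid so that later tasks skip it. */
    static final class Instance {
        private Object data;
        private boolean valid = true;

        Instance(Object data) { this.data = data; }
        Object getData() { return data; }
        void setData(Object data) { this.data = data; }
        boolean isValid() { return valid; }
        void invalidate() { valid = false; }
    }

    /** A task declares the type it consumes and the type it produces. */
    static final class Task {
        final String name;
        final Class<?> inputType;
        final Class<?> outputType;
        final Function<Object, Object> body;

        Task(String name, Class<?> inputType, Class<?> outputType,
             Function<Object, Object> body) {
            this.name = name;
            this.inputType = inputType;
            this.outputType = outputType;
            this.body = body;
        }
    }

    private final List<Task> tasks = new ArrayList<>();

    PipelineSketch add(Task task) {
        tasks.add(task);
        return this;
    }

    /** Early check: each task's output type must be accepted by the next task. */
    void check() {
        for (int i = 1; i < tasks.size(); i++) {
            Task prev = tasks.get(i - 1);
            Task next = tasks.get(i);
            if (!next.inputType.isAssignableFrom(prev.outputType)) {
                throw new IllegalStateException(prev.name + " produces "
                        + prev.outputType.getSimpleName() + " but " + next.name
                        + " expects " + next.inputType.getSimpleName());
            }
        }
    }

    /** Runs each task over the whole burst; a failing instance is invalidated. */
    List<Instance> run(List<Instance> burst) {
        check(); // fail before any data is touched
        for (Task task : tasks) {
            for (Instance inst : burst) {
                if (!inst.isValid()) continue; // skip instances already rejected
                try {
                    inst.setData(task.body.apply(inst.getData()));
                } catch (RuntimeException e) {
                    inst.invalidate();
                }
            }
        }
        return burst;
    }

    public static void main(String[] args) {
        PipelineSketch p = new PipelineSketch()
                .add(new Task("trim", String.class, String.class, o -> ((String) o).trim()))
                .add(new Task("parse", String.class, Integer.class, o -> Integer.valueOf((String) o)));
        for (Instance i : p.run(List.of(new Instance(" 42 "), new Instance("abc")))) {
            System.out.println(i.getData() + " valid=" + i.isValid());
        }
    }
}
```

    In this sketch, a pipeline whose tasks are added in an incompatible order is rejected by check() before any instance is processed, which is the kind of early detection the abstract argues for.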

    Extraction and preprocessing of opinions on the wine tourism sector in the province of Ourense

    Analysing the opinions or comments that users post online is a booming practice among companies and institutions, owing to the repercussions that the ratings expressed can have on the products or services they offer, and also because it is receiving considerable attention from the scientific community for the wide range of applications in which it is being used. Manually analysing the impact generated by comments about a company, product or sector takes a great deal of time and effort, so numerous techniques have been developed for the automated extraction of web page content, which is normally presented in unstructured form, along with numerous preprocessing techniques that facilitate its subsequent analysis. In this regard, the present work aims to use web scraping to extract the information and comments available on the websites of the wineries of the four designations of origin (DO) in the province of Ourense (DO do Ribeiro, DO de Valdeorras, DO de Monterrei and DO da Ribeira Sacra). Subsequently, a sentiment analysis process will be carried out on the comments obtained, with the aim of adding more information to the subsequent statistical study of the data retrieved from the web and thus obtaining a digital assessment of the province's wine tourism sector that helps to guide future decisions.

    Alnus airborne pollen trends during the last 26 years for improving machine learning-based forecasting methods

    Black alder (Alnus glutinosa (L.) Gaertn.) is a tree species widespread across Europe and found in mixed hardwood forests. In urban environments, the tree is usually located along watercourses, as is the case in the city of Ourense. This taxon belongs to the Betulaceae family, so it has a high allergenic potential in sensitive people. Due to the high allergenic capacity of this pollen type and the increase in global temperature produced by climate change, which induces greater allergenicity, the present study proposes the implementation of a Machine Learning (ML) model capable of accurately predicting high-risk periods for allergies among sensitive people. The study was carried out in the city of Ourense for 28 years, and pollen data were collected by means of a Hirst-type trap, model Lanzoni VPPS-2000. During the same period, meteorological data were obtained from the meteorological station of METEOGALICIA in Ourense. We observed that Alnus airborne pollen was present in the study area during the winter months, mainly in January and February. We found statistically significant trends for the end of the main pollen season, with a lag trend of 0.68 days per year, an increase in the annual pollen integral of 112 pollen grains per year, and an increase of approximately 12 pollen grains/m³ per year during the pollen peak. A Spearman correlation test was carried out in order to select the variables for the ML model. The best ML model was Random Forest, which was able to detect the days with medium and high labels. Funding: Xunta de Galicia | Ref. ED431C 2022/03-GRC; Xunta de Galicia | Ref. CO-0034-2021 00V
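
    The variable-selection step mentioned in this abstract relies on Spearman's rank correlation, which can be computed as the Pearson correlation of the ranks of two series. The following is a minimal sketch of that coefficient only; the class name SpearmanSketch and the values in main are hypothetical placeholders, not data or code from the study.

```java
import java.util.Arrays;

/** Hypothetical sketch of Spearman's rank correlation coefficient. */
class SpearmanSketch {

    /** Returns 1-based ranks; tied values share the mean of their positions. */
    static double[] ranks(double[] v) {
        int n = v.length;
        Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        Arrays.sort(idx, (a, b) -> Double.compare(v[a], v[b]));

        double[] r = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            // Extend j over the run of values equal to v[idx[i]].
            while (j + 1 < n && v[idx[j + 1]] == v[idx[i]]) j++;
            double avg = (i + j) / 2.0 + 1.0;
            for (int k = i; k <= j; k++) r[idx[k]] = avg;
            i = j + 1;
        }
        return r;
    }

    /** Pearson correlation; yields NaN when either series is constant. */
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double mx = 0.0, my = 0.0;
        for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
        mx /= n;
        my /= n;
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - mx;
            double dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** Spearman's rho: the Pearson correlation of the two rank vectors. */
    static double spearman(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("series must have equal length >= 2");
        }
        return pearson(ranks(x), ranks(y));
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only (not measurements from the study).
        double[] temperature = {4.0, 6.5, 5.0, 9.0, 7.5};
        double[] pollen = {10.0, 40.0, 15.0, 90.0, 35.0};
        System.out.println("rho = " + spearman(temperature, pollen));
    }
}
```

    A variable would then be kept or discarded by comparing the coefficient (and, in a full test, its significance) against a chosen threshold; the abstract does not state which threshold was used.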

    Automatic detection of periods of allergy risk for the population of Ourense

    The number of people with allergic reactions to pollen has increased considerably, so it is useful to have mechanisms that determine, as accurately as possible, the amount of pollen that will be present in the atmosphere and thereby reduce its impact on the population. To predict pollen concentration, studies were carried out using linear regression models, which later evolved towards machine learning or deep learning models. Although these models are well suited to predicting pollen concentration, the results obtained depend largely on the existence of previous concentration measurements and are influenced by the quality of the available data. The joint research of the botany and computer science disciplines seeks to estimate the risk of pollen allergies in a way that allows antihistamines to be administered before exposure, since this has been shown to be far more effective than administration after the first symptoms have appeared. Specifically, this estimation was made for Alnus, Betula, Platanus, Poaceae and Urticaceae, the five pollen types considered most aggressive in the province of Ourense. The botany research group was responsible for collecting pollen concentration data and for normalising and representing the collected values; it also calculated the main pollen season for each pollen type and proposed a pollen calendar for the city of Ourense. The computer science research group focused on analysing the data provided and on comparing different machine learning techniques to classify pollen concentrations in the atmosphere of the province of Ourense and to support decision-making. This work presents the experimentation with the Alnus pollen type only; the approach is expected to be suitable for each of the other pollen types as well, adapting the most appropriate model in each case.

    GC4S: A bioinformatics-oriented Java software library of reusable graphical user interface components

    Modern bioinformatics and computational biology are fields of study driven by the availability of effective software required for conducting appropriate research tasks. Apart from providing reliable and fast implementations of different data analysis algorithms, these software applications should also be clear and easy to use through proper user interfaces, providing appropriate data management and visualization capabilities. In this regard, the user experience obtained by interacting with these applications via their Graphical User Interfaces (GUIs) is a key factor in their final success and real utility for researchers. Despite the existence of different packages and applications focused on advanced data visualization, there is a lack of specific libraries providing pertinent GUI components able to help developers of scientific bioinformatics software. To that end, this paper introduces GC4S, a bioinformatics-oriented collection of high-level, extensible, and reusable Java GUI elements specifically designed to speed up bioinformatics software development. With GC4S, developers of new applications can focus on the specific GUI requirements of their projects, relying on GC4S for generalities and abstractions. GC4S is free software distributed under the terms of the GNU Lesser General Public License, and both source code and documentation are publicly available at http://www.sing-group.org/gc4s. Funding: Xunta de Galicia | Ref. ED481B 2016/068-

    Using natural language preprocessing architecture (NLPA) for Big Data text sources

    In recent years, big data analysis has become a popular means of taking advantage of multiple (initially valueless) sources to find relevant knowledge about real domains. However, a large number of big data sources provide unstructured textual data. A proper analysis requires tools able to adequately combine big data and text-analysing techniques. Keeping this in mind, we combined a pipelining framework, BDP4J (Big Data Pipelining For Java), with the implementation of a set of text preprocessing techniques in order to create NLPA (Natural Language Preprocessing Architecture), an extendable open-source plugin implementing preprocessing steps that can be easily combined to create a pipeline. Additionally, NLPA can generate datasets using either a classical token-based representation of data or newer synset-based datasets that can be further processed using semantic information (i.e., using ontologies). This work presents a case study of NLPA operation covering the transformation of raw heterogeneous big data into different dataset representations (synsets and tokens) and using the Weka application programming interface (API) to launch two well-known classifiers. Funding: Xunta de Galicia | Ref. ED481B 2017/018; Agencia Estatal de Investigación | Ref. TIN2017-84658-C2-1-
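
    As a rough illustration of preprocessing steps that can be combined into a pipeline and then turned into a token-based representation, the sketch below chains three text transformations and counts the resulting tokens. It does not use NLPA or BDP4J: the class TokenPipelineSketch, the step names and the regular expressions are assumptions made for this example, and the synset-based representation is not covered.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical sketch; the step names are not taken from NLPA. */
class TokenPipelineSketch {

    /** Removes http/https URLs (assumed pattern: scheme followed by non-spaces). */
    static final UnaryOperator<String> STRIP_URLS =
            s -> s.replaceAll("https?://\\S+", " ");

    /** Lowercases the text independently of the default locale. */
    static final UnaryOperator<String> LOWERCASE =
            s -> s.toLowerCase(Locale.ROOT);

    /** Replaces everything except letters, digits and whitespace with a space. */
    static final UnaryOperator<String> STRIP_PUNCTUATION =
            s -> s.replaceAll("[^\\p{L}\\p{Nd}\\s]", " ");

    /** Applies the steps in the given order. */
    static String preprocess(String text, List<UnaryOperator<String>> steps) {
        String out = text;
        for (UnaryOperator<String> step : steps) {
            out = step.apply(out);
        }
        return out;
    }

    /** Token-based representation: token -> number of occurrences. */
    static Map<String, Integer> tokenCounts(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String raw = "Buy NOW at http://spam.example/x, buy!";
        String clean = preprocess(raw, List.of(STRIP_URLS, LOWERCASE, STRIP_PUNCTUATION));
        System.out.println(tokenCounts(clean));
    }
}
```

    Because each step has the same signature, steps can be reordered or replaced without touching the rest of the chain, which is the property the abstract highlights.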

    Enhancing representation in the context of multiple-channel spam filtering

    This study addresses the use of different features to complement synset-based and bag-of-words representations of texts in the context of using classical ML approaches for spam filtering (Ferrara, 2019). Despite the existence of a large number of complementary features, in order to improve the applicability of this study, we selected only those that can be computed regardless of the communication channel used to distribute content. Feature evaluation was performed using content distributed through different channels (social networks and email) and classifiers (Adaboost, Flexible Bayes, Naïve Bayes, Random Forests, and SVMs). The results revealed the usefulness of detecting some non-textual entities (such as URLs, Uniform Resource Locators) in the addressed distribution channels. Moreover, we also found that compression properties and/or information regarding the probability of correctly guessing the language of target texts could be successfully used to improve the classification in a wide range of situations. Finally, we also detected features that are influenced by specific fashions and habits of users of certain Internet services (e.g. the existence of words written in capital letters) and that are not useful for spam filtering. Funded for open access publication: Universidade de Vigo/CISUG. Funding: Xunta de Galicia | Ref. ED481D-2021/024; Agencia Estatal de Investigación | Ref. TIN2017-84658-C2-1-
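
    Three of the channel-independent features named in this abstract (URL presence, compression properties, and words written in capital letters) can be sketched as follows. The exact definitions used in the study are not given here, so the URL pattern, the use of DEFLATE for the compression ratio, and the rule for what counts as a capitalised word are all assumptions; the class name ChannelIndependentFeatures is hypothetical. The language-guessing feature is not sketched.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.Deflater;

/** Hypothetical feature extractors; definitions are assumptions, not the study's. */
class ChannelIndependentFeatures {

    // Assumed URL pattern: http or https scheme followed by non-space characters.
    private static final Pattern URL = Pattern.compile("https?://\\S+");

    /** Number of URLs found in the text. */
    static int urlCount(String text) {
        Matcher m = URL.matcher(text);
        int n = 0;
        while (m.find()) n++;
        return n;
    }

    /** DEFLATE-compressed size divided by the UTF-8 size; lower means more redundant. */
    static double compressionRatio(String text) {
        byte[] input = text.getBytes(StandardCharsets.UTF_8);
        if (input.length == 0) return 1.0;
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[1024];
        long compressed = 0;
        while (!deflater.finished()) {
            compressed += deflater.deflate(buffer);
        }
        deflater.end();
        return (double) compressed / input.length;
    }

    /** Share of whitespace-separated words (length > 1) written fully in capitals. */
    static double upperCaseWordRatio(String text) {
        int total = 0;
        int upper = 0;
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) continue;
            total++;
            boolean hasLetters = !word.equals(word.toLowerCase(Locale.ROOT));
            if (word.length() > 1 && hasLetters && word.equals(word.toUpperCase(Locale.ROOT))) {
                upper++;
            }
        }
        return total == 0 ? 0.0 : (double) upper / total;
    }

    public static void main(String[] args) {
        String msg = "BUY now FREE pills at http://spam.example/offer";
        System.out.println("urls=" + urlCount(msg));
        System.out.println("compression=" + compressionRatio(msg));
        System.out.println("uppercase=" + upperCaseWordRatio(msg));
    }
}
```

    Each value would be appended to the token-based or synset-based feature vector of a message before training a classifier. Note that the abstract reports the capital-letters feature as one that did not help spam filtering, so it is included here only to show how such a feature could be computed.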